How to Convert Delta Parquet Files to a Single Parquet File with Latest Version of Delta

Hello @Richards, Sam (DG-STL-HQ),

Welcome to the MS Q&A platform.

To export the latest version of a Delta table as a single Parquet file, you can use Apache Spark and Delta Lake.

Load the Delta table into a Spark DataFrame:

df = spark.read.format("delta").load(delta_table_path)

df.show()

Get a handle to the Delta table; toDF() returns its latest snapshot:

from delta.tables import DeltaTable

delta_table = DeltaTable.forPath(spark, delta_table_path)

df = delta_table.toDF()

df.show()

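No filter on a version column is needed: the table data has no version column, and both load() and toDF() already return the latest committed version. To check which version that is, inspect the table history:

delta_table.history(1).select("version", "timestamp").show()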
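
If the export needs to be pinned to an explicit version, for example so that a rerun reads the same snapshot, the version number can be looked up from the table history and passed to the reader. This is a minimal sketch, assuming the Spark session is already configured for Delta Lake; the helper names latest_version and read_version are illustrative and not part of the Delta API:

```python
def latest_version(delta_table):
    """Return the most recent commit version of a DeltaTable."""
    # history(1) returns a one-row DataFrame describing the latest commit.
    row = delta_table.history(1).select("version").collect()[0]
    return row[0]


def read_version(spark, delta_table_path, version):
    """Read the table as of a specific version (Delta time travel)."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)
        .load(delta_table_path)
    )
```

With the objects from the earlier steps, this would be called as df = read_version(spark, delta_table_path, latest_version(delta_table)).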

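Write out the DataFrame as a single Parquet file. coalesce(1) collapses the data to one partition, so Spark writes exactly one part file inside the output folder (output_parquet_path is a placeholder for your target folder):

df.coalesce(1).write.mode("overwrite").parquet(output_parquet_path)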
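
Spark always writes a folder, even when the data has been reduced to one partition: the rows land in a part-*.parquet file next to marker files such as _SUCCESS. If a single file with a fixed name is required, the part file can be moved out afterwards. This is a minimal sketch, assuming the output folder is on a local filesystem (for cloud storage paths, the platform's own file utilities would be needed instead); promote_part_file is a hypothetical helper name:

```python
import glob
import os
import shutil


def promote_part_file(output_dir, target_file):
    """Move the single part file Spark wrote in output_dir to target_file.

    target_file must live outside output_dir, because output_dir is
    deleted once the part file has been moved.
    """
    parts = glob.glob(os.path.join(output_dir, "part-*.parquet"))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(parts[0], target_file)
    # Remove the leftover folder along with _SUCCESS and checksum files.
    shutil.rmtree(output_dir)
    return target_file
```

The check on the number of part files guards against running this on output that was written without reducing it to one partition first.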

If you have plain Parquet files (not in Delta Lake format), you can use the Apache Spark Python script below to convert the folder in place to a Delta Lake table. The path inside the identifier must be wrapped in backticks; <path-to-parquet-folder> is a placeholder:

%%pyspark

from delta.tables import DeltaTable

deltaTable = DeltaTable.convertToDelta(spark, "parquet.`<path-to-parquet-folder>`")
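
Because the table identifier is a string of the form parquet.`path`, the quoting is easy to get wrong when the path comes from a variable. This is a small sketch that builds the identifier and then calls convertToDelta; the helper names are illustrative, and the import is done inside the function so the snippet only needs the delta package when the conversion actually runs:

```python
def parquet_identifier(folder_path):
    """Build the table identifier convertToDelta expects: parquet.`<path>`."""
    if "`" in folder_path:
        raise ValueError("path must not contain backticks")
    return f"parquet.`{folder_path}`"


def convert_folder_to_delta(spark, folder_path):
    """Convert an unpartitioned Parquet folder to a Delta table in place."""
    from delta.tables import DeltaTable  # requires the delta-spark package

    return DeltaTable.convertToDelta(spark, parquet_identifier(folder_path))
```

For a partitioned folder, convertToDelta also takes the partition schema as a third argument (for example "year int"), which this sketch leaves out.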

Reference documents:

https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/synapse-analytics/spark/apache-spark-delta-lake-overview.md

https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/synapse-analytics/sql/query-delta-lake-format.md

I hope this helps. Please let us know if you have any further questions.


